iNZight, Surveys, and the IDI

Tom Elliott

Te Rourou Tātaritanga
Victoria University of Wellington

tomelliott.co.nz

Updates

PhD thesis

  • submitted April (during Alert Level 4)
  • defended August (during Alert Level 4)
  • graduation tomorrow!

PhD thesis (TL;DR)

  • predicting buses is hard
  • using real-time traffic data from other buses doesn’t help point estimates much …
  • … but interval estimates are actually reliable!
  • quantities like “Pr(catch bus | arrive at)”
  • useful for probabilistic journey planning

If you’re interested … https://tomelliott.co.nz/phd

Postdoc @ VUW @ UoA

  • MBIE Endeavour grant
    • Colin Simpson (VUW), Barry Milne (COMPASS), Andrew Sporle
    • Informatics for Social Services and Wellbeing …
    • more later!
  • Honorary position here (thanks James)

iNZight

iNZight main window

  • my side-project since 2013/14

  • shifting focus as audience has evolved

    • pre 2015: school/some university

    • 2015–2019: education (school/university/FutureLearn), and cropping up in unexpected places (around the world)

    • recently:

      • Democratisation

        (See Chris Wild’s talks, featuring hits like “We Will Plot You”)

      • rapid research development tools

        (Andrew Sporle) for organisations short on money, time, or both

  • recent focus on surveys — now handled natively!

    • plots
    • summaries (tables of counts)
    • inference / modelling
    • data wrangling …
  • key goal is removal of barriers

Surveys and iNZight

Data

GUI

Explore

Save output / script

What if data is from a survey?

In R

iNZight isn’t much better … or is it?!

Specify survey design

(Remember survey variables never have nice names)

mysurvey.zip

  • mysurvey.csv
  • mysurvey.svydesign

Demo

iNZight main window

mysurvey.svydesign

data = "mysurvey.csv"
weights = "wt0"
repweights = "^w[0-4]"
reptype = "JK1"

  • User doesn’t have to know about the underlying survey design

  • Researchers can quickly open and explore a (survey) data set

  • Everything is taken care of

    • plots (dotplots become histograms, scatter plots become bubble plots or hexbin plots)

    • summary tables give population counts (plus errors)

    • data wrangling functions use the correct methods

      e.g., the survey package’s subset() method for filtering
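
For comparison, the design described in mysurvey.svydesign could be specified by hand in R roughly as follows. This is a minimal sketch, not iNZight's actual import code: it assumes the file's fields map directly onto arguments of survey::svrepdesign(), and it uses a small made-up data set (the column y and all weight values are invented) in place of mysurvey.csv.

```r
library(survey)

# Made-up stand-in for mysurvey.csv: 20 units, one outcome (y),
# a sampling weight (wt0) and five jackknife replicate weights (w0..w4).
set.seed(2021)
n <- 20
mysurvey <- data.frame(y = rnorm(n, mean = 50, sd = 10), wt0 = 10)

# JK1 replicates: drop one group of units at a time and scale up the rest.
group <- rep(0:4, length.out = n)
for (k in 0:4) {
  mysurvey[[paste0("w", k)]] <- ifelse(group == k, 0, mysurvey$wt0 * 5 / 4)
}

# The same four pieces of information as the .svydesign file.
des <- svrepdesign(
  data = mysurvey,
  weights = ~wt0,
  repweights = "^w[0-4]",  # regular expression matching w0..w4 (not wt0)
  type = "JK1"
)

# Estimates now come with design-based standard errors.
svymean(~y, des)
svytotal(~y, des)
```

Every later call (svymean(), svytotal(), model fitting, subsetting) has to go through the design object rather than the raw data frame, which is the bookkeeping the GUI hides from the user.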

(A few) Details

iNZight’s R package collection

  • iNZight is not just a single R package

  • collection of 9+ ‘iNZight*’ packages with specific tasks

    • ‘iNZightPlots’ makes graphs

    • ‘iNZightTools’ provides a suite of utility functions (data wrangling)

  • main GUI package provides interface and collects user inputs (and displays results)

  • wrapper functions make programming GUIs much easier — just a case of mapping inputs to arguments

  • … and allow us to return the behind-the-scenes R code!

An example: Filtering data

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa
## [1] "iris %>% dplyr::filter(Sepal.Width < 100)"

  • recent work involved modifying wrapper functions to handle surveys

  • the GUI just needs to pass around a ‘data-thing’ (either data or survey)

## [1] "dclus2 %>% srvyr::as_survey() %>% srvyr::filter(api99 >= 700)"

Big thanks to the ‘srvyr’ package!
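
The wrapper idea can be sketched as follows. The function filter_numeric() is a hypothetical illustration written for this example, not the iNZightTools API: it builds the pipeline as text, evaluates it, and attaches the code to the result, switching to srvyr when it is handed a survey design instead of a data frame.

```r
library(dplyr)
library(survey)
library(srvyr)

# Hypothetical wrapper (not the real iNZightTools function): filter a
# 'data-thing' on a numeric variable and keep the code that did it.
filter_numeric <- function(data, var, op, value) {
  data_name <- deparse(substitute(data))
  is_survey <- inherits(data, c("survey.design", "svyrep.design"))

  code <- if (is_survey) {
    sprintf("%s %%>%% srvyr::as_survey() %%>%% srvyr::filter(%s %s %s)",
            data_name, var, op, value)
  } else {
    sprintf("%s %%>%% dplyr::filter(%s %s %s)", data_name, var, op, value)
  }

  # Evaluate the generated code in the environment the wrapper was called from.
  result <- eval(parse(text = code), envir = parent.frame())
  attr(result, "code") <- code
  result
}

# Plain data frame
iris_small <- filter_numeric(iris, "Sepal.Width", "<", 100)
attr(iris_small, "code")

# Survey design, using the 'api' example data from the survey package
data(api, package = "survey")
dclus2 <- svydesign(id = ~dnum + snum, fpc = ~fpc1 + fpc2, data = apiclus2)
api_high <- filter_numeric(dclus2, "api99", ">=", 700)
attr(api_high, "code")
```

The GUI only has to collect the variable, operator, and value from the user and pass them on; the two stored code strings are the same ones shown in the output above.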

Te Rourou Tātaritanga

How does this all relate to my postdoc?

Rourou = basket

Nā tō rourou, nā taku rourou, ka ora ai te iwi.

(With your food basket and my food basket the people will thrive.)

Tātaritanga = analysis

“Tools for analytics and sharing data for the betterment of communities.”


Or: “Informatics for Social Services and Wellbeing”

Primary goals

  1. Improve data standards

  2. Promote Māori data sovereignty

  3. Develop systems to support access

  4. Evaluate synthesising of datasets

  5. Security and privacy implications

  6. Machine learning and AI methods

https://terourou.org

The Integrated Data Infrastructure (IDI)

  • database connecting data across NZ’s sectors

  • high security environment

  • but also other unnecessary barriers: coding!

iNZight to the rescue!

  • many upcoming researchers will have used iNZight at high school or university

  • no need to learn to code, OR remember how to do things you haven’t done in 2 years

  • currently working on deploying a demo of iNZight in the Stats NZ data lab — watch this space!

    • initial goal: confine to small datasets
    • primary researcher can prepare using SQL to select/join data
    • other researchers (without great coding skills) can easily explore the data — graphs, tables, models!
    • offers a restricted set of methods which can help prevent novices from running really-big-queries and causing havoc on the servers
    • and build from there!

Outside the data lab

  • lots of data outside the data lab

  • many iwi groups, Pacific nations, etc. have specific needs for simple (to complex) population summaries/demographic outputs

  • iNZight means they can do it every 1–2 years without needing to train/retrain/pay expensive statisticians

  • iNZight also produces code: generate a script to re-run/edit as necessary (without having to do all the hard stuff first)

Bayesian demography

  • why limit yourself to tables when you can fit hierarchical Bayesian models with model-specific priors, likelihoods, … ?

  • John Bryant has a set of R packages (dembase, demest, …) for doing Bayesian demography

  • using them is a bit of a challenge (especially if you don’t do much R coding!)

  • so we tested out iNZight’s new add-on system …

DEMO

Other projects

Both work and ‘fun’

IDI Search App

  • to get access to the IDI, you need to put together a research proposal

  • putting together a research proposal requires knowing what data is available to investigate

  • that data is hidden away in the IDI

IDI Search App

  • we put together a simple web app providing a searchable database so prospective (and current) IDI researchers can explore what’s available

  • built using ReactJS

DEMO

https://idi-search.web.app/

Bus display v2

  • the display in 302 was broken

  • so I rebuilt it again, this time using ReactJS + d3

  • simpler than the last version (no ‘history’ as it just uses real-time data, no backing server)

DEMO

https://tomelliott.co.nz/bus-display/

Lots of ReactJS …

  • it’s my goal to, one day, put together a prototype of a new version of iNZight using ReactJS and Rserve

  • one version that runs on Windows / macOS / Linux / web

  • plus the capability of using a local R server or a remote R server (behind a firewall, etc.)

NO DEMO

Thank you

Github: tmelliott | iNZightVIT | terourou

Twitter: @tomelliottnz | @iNZightUoA | @terourou

tomelliott.co.nz | inzight.nz | terourou.org
